action localization


VideoCapsuleNet: A Simplified Network for Action Detection

Kevin Duarte, Yogesh Rawat, Mubarak Shah

Neural Information Processing Systems

We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection which can jointly perform pixel-wise action segmentation along with action classification. The proposed network is a generalization of the capsule network from 2D to 3D, which takes a sequence of video frames as input. The 3D generalization drastically increases the number of capsules in the network, making capsule routing computationally expensive.
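The abstract's claim that the 3D generalization "drastically increases" the capsule count can be made concrete with a quick back-of-envelope calculation. The grid sizes and capsule-type count below are hypothetical illustrations, not values from the paper:

```python
# Hypothetical back-of-envelope: capsule count in a 2D vs. 3D capsule layer.
# Grid sizes and capsule-type counts below are illustrative, NOT the paper's values.

def num_capsules(spatial_dims, capsule_types):
    """Number of capsules in a convolutional capsule layer:
    one capsule of each type at every grid position."""
    n = capsule_types
    for d in spatial_dims:
        n *= d
    return n

# 2D capsule layer over a 20x20 feature map with 32 capsule types.
caps_2d = num_capsules([20, 20], 32)         # 12,800 capsules

# 3D generalization: same spatial grid, plus 8 temporal positions.
caps_3d = num_capsules([8, 20, 20], 32)      # 102,400 capsules

print(caps_2d, caps_3d, caps_3d // caps_2d)  # the frame axis multiplies routing cost
```

Since routing compares every lower-layer capsule against every higher-layer capsule, even a modest temporal extent multiplies the routing work by the number of frames, which is why the paper must address routing cost.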



Are all Frames Equal? Active Sparse Labeling for Video Action Detection

Neural Information Processing Systems

We demonstrate that the proposed approach performs better than random selection, outperforming all other baselines, with performance comparable to the supervised approach using merely 10% of the annotations.


MambaTAD: When State-Space Models Meet Long-Range Temporal Action Detection

Lu, Hui, Yu, Yi, Lu, Shijian, Rajan, Deepu, Ng, Boon Poh, Kot, Alex C., Jiang, Xudong

arXiv.org Artificial Intelligence

Abstract -- Temporal Action Detection (TAD) aims to identify and localize actions by determining their starting and ending frames within untrimmed videos. Recent Structured State-Space Models such as Mamba have demonstrated potential in TAD due to their long-range modeling capability and linear computational complexity. On the other hand, structured state-space models often face two key challenges in TAD, namely, decay of temporal context due to recursive processing and self-element conflict during global visual context modeling, which become more severe while handling long-span action instances. This paper presents MambaTAD, a new state-space TAD model that introduces long-range modeling and global feature detection capabilities for accurate temporal action detection. MambaTAD comprises two novel designs that complement each other with superior TAD performance. First, it introduces a Diagonal-Masked Bidirectional State-Space (DMBSS) module which effectively facilitates global feature fusion and temporal action detection. Second, it introduces a global feature fusion head that refines the detection progressively with multi-granularity features and global awareness. In addition, MambaTAD tackles TAD in an end-to-end one-stage manner using a new state-space temporal adapter (SSTA) which reduces network parameters and computation cost with linear complexity. Extensive experiments show that MambaTAD achieves superior TAD performance consistently across multiple public benchmarks.

Temporal action detection (TAD) aims to detect specific action categories and extract corresponding temporal spans in untrimmed videos. It is a long-standing and challenging problem in video understanding with extensive real-world applications such as sports analysis, surveillance and security. The development of deep neural networks such as CNNs [1], [2] and Transformers [3], [4] has led to continuous advancements in TAD performance over the past few years. However, CNNs have limited capabilities in capturing long-range dependencies, while Transformers face challenges with computational complexity and feature discrimination [1].

Hui Lu and Yi Yu are with the Rapid-Rich Object Search Lab, Interdisciplinary Graduate Programme, Nanyang Technological University, Singapore (e-mail: {hui007, yuyi0010}@e.ntu.edu.sg).
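The "self-element conflict" the abstract mentions refers to a position's own feature dominating its global context. A minimal way to picture the diagonal-masked bidirectional idea is an aggregation where each position fuses context from both directions while its own element is excluded. The NumPy sketch below is a plain exclusive prefix/suffix-sum illustration of that masking principle, not the paper's actual state-space recurrence or learned parameters:

```python
# Minimal sketch of diagonal-masked bidirectional context aggregation:
# position t sees everything before it (forward scan) and everything after
# it (backward scan), but its own element is masked out of the aggregate.
# Illustrative only -- NOT MambaTAD's DMBSS module or its SSM recurrence.
import numpy as np

def diagonal_masked_bidirectional(x):
    """x: (T, C) feature sequence. Returns (T, C) context features where
    row t equals sum(x[:t]) + sum(x[t+1:]), i.e. everything but itself."""
    forward = np.cumsum(x, axis=0) - x               # exclusive prefix sum
    backward = np.cumsum(x[::-1], axis=0)[::-1] - x  # exclusive suffix sum
    return forward + backward

x = np.arange(12, dtype=float).reshape(4, 3)  # 4 timesteps, 3 channels
ctx = diagonal_masked_bidirectional(x)
# Each row of ctx equals the per-channel totals minus that row's own features.
```

The point of the mask is visible in the last line: the global aggregate at each timestep never contains the timestep's own contribution, so global context cannot be trivially dominated by self-similarity.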





184260348236f9554fe9375772ff966e-Reviews.html

Neural Information Processing Systems

NIPS 2013 Neural Information Processing Systems, December 5-10, Lake Tahoe, Nevada, USA

Paper ID: 1139
Title: "Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization"

Reviews

First provide a summary of the paper, and then address the following criteria: quality, clarity, originality and significance.

This paper proposes a method for action detection (localization and classification of actions) using weakly supervised information (action labels + eye-gaze information, no explicit definition of bounding boxes). Overall, the spatio-temporal search (a huge spatio-temporal space) is carried out using dynamic programming and a max-path algorithm. Gaze information is introduced into the framework through a loss which accounts for gaze density at a given location.

QUALITY: The paper seems technically sound and makes for a nice study given gaze information.
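The max-path dynamic program the review mentions can be sketched in miniature: score every candidate location per frame, then find the temporally smooth path with the highest summed score. The score grid, the 1-D location axis, and the smoothness constraint below are simplifying assumptions for illustration, not the reviewed paper's actual formulation (which searches over spatio-temporal boxes):

```python
# Hedged sketch of a max-path dynamic program over per-frame location scores,
# in the spirit of the spatio-temporal search the review describes. A real
# system would score candidate boxes per frame; here locations are a 1-D
# index and the smoothness constraint (max_jump) is an assumption.
import numpy as np

def max_path(scores, max_jump=1):
    """scores: (T, L) array of per-frame scores for L candidate locations.
    Returns the location sequence maximizing the summed score, where
    consecutive locations may differ by at most max_jump."""
    T, L = scores.shape
    dp = scores[0].copy()                 # best score ending at each location
    back = np.zeros((T, L), dtype=int)    # backpointers for path recovery
    for t in range(1, T):
        new_dp = np.full(L, -np.inf)
        for l in range(L):
            lo, hi = max(0, l - max_jump), min(L, l + max_jump + 1)
            prev = int(np.argmax(dp[lo:hi])) + lo
            new_dp[l] = dp[prev] + scores[t, l]
            back[t, l] = prev
        dp = new_dp
    # Trace back from the best final location.
    path = [int(np.argmax(dp))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Because each frame only consults its predecessor, the search over all location sequences runs in O(T * L * max_jump) instead of the exponential cost of enumerating paths, which is what makes the huge spatio-temporal space tractable.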